A Topic Coverage Approach to Evaluation of Topic Models
Abstract
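Topic models are widely used unsupervised models capable of learning topics (weighted lists of words and documents) from large collections of text documents. When topic models are used for discovery of topics in text collections, a question that arises naturally is how well the model-induced topics correspond to topics of interest to the analyst. In this paper we revisit and extend a so far neglected approach to topic model evaluation based on measuring topic coverage: computationally matching model topics with a set of reference topics that models are expected to uncover. The approach is well suited for analyzing models' performance in topic discovery and for large-scale analysis of both topic models and measures of model quality. We propose new measures of coverage and evaluate, in a series of experiments, different types of topic models on two distinct text domains for which interest for topic discovery exists. The experiments include evaluation of model quality, analysis of coverage of distinct topic categories, and analysis of the relationship between coverage and other methods of topic model evaluation. The paper contributes a new supervised measure of coverage, and the first unsupervised measure of coverage. The supervised measure achieves topic matching accuracy close to human agreement. The unsupervised measure correlates highly with the supervised one (Spearman's $\rho \geq 0.95$). Other contributions include insights into both topic models and different methods of model evaluation, and datasets and code for facilitating future research on topic coverage.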
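As an illustration of the coverage idea described in the abstract, the following is a minimal sketch, not the paper's method. It assumes topics are represented as word-weight vectors over a shared vocabulary, and it replaces the paper's supervised topic matcher with a plain cosine-distance threshold. The function names, the toy vectors, and the threshold value are all hypothetical.

```python
import numpy as np


def cosine_distance_matrix(a, b):
    """Pairwise cosine distances between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - a @ b.T


def coverage(reference_topics, model_topics, threshold=0.4):
    """Fraction of reference topics matched by at least one model topic.

    Assumption: a reference topic counts as covered when its nearest model
    topic lies within `threshold` cosine distance. This is a simplified
    stand-in for a learned topic matcher.
    """
    dist = cosine_distance_matrix(reference_topics, model_topics)
    covered = dist.min(axis=1) <= threshold
    return float(covered.mean())


# Toy data: three reference topics and two model topics over four words.
reference = np.array([
    [0.70, 0.20, 0.10, 0.00],
    [0.00, 0.10, 0.20, 0.70],
    [0.25, 0.25, 0.25, 0.25],
])
model = np.array([
    [0.60, 0.30, 0.10, 0.00],
    [0.00, 0.00, 0.30, 0.70],
])

print(coverage(reference, model, threshold=0.1))
```

With the strict threshold of 0.1, the first two reference topics each have a close model topic, while the uniform third topic does not, so two of the three reference topics count as covered. Loosening the threshold trades matching precision for higher measured coverage.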
Similar resources
A network approach to topic models
Martin Gerlach, Tiago P. Peixoto, and Eduardo G. Altmann. Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA; Max Planck Institute for the Physics of Complex Systems, D-01187 Dresden, Germany; Department of Mathematical Sciences and Centre for Networks and Collective Behaviour, University of Bath, Claverton Down, Bath BA2 7AY, United Kingdom. I...
A Generic Approach to Topic Models
This article contributes a generic model of topic models. To define the problem space, general characteristics for this class of models are derived, which give rise to a representation of topic models as “mixture networks”, a domain-specific compact alternative to Bayesian networks. Besides illustrating the interconnection of mixtures in topic models, the benefit of this representation is its st...
Robust Evaluation of Topic Models
Statistical topic models such as latent Dirichlet allocation (LDA) have become enormously popular in the past decade, with dozens of extensions being proposed each year in conferences such as NIPS, ICML, KDD, EMNLP, and others. Test set perplexity is frequently the method of choice used in these papers for comparing new models with older variants, yet relatively little attention has been paid (...
External Evaluation of Topic Models
Topic models can learn topics that are highly interpretable and semantically coherent, and that can be used similarly to subject headings. But sometimes learned topics are lists of words that do not convey much useful information. We propose models that score the usefulness of topics, including a model that computes a score based on pointwise mutual information (PMI) of pairs of words in a topic. Our PM...
The Sensitivity of Topic Coherence Evaluation to Topic Cardinality
When evaluating the quality of topics generated by a topic model, the convention is to score topic coherence, either manually or automatically, using the top-N topic words. This hyper-parameter N, or the cardinality of the topic, is often overlooked and selected arbitrarily. In this paper, we investigate the impact of this cardinality hyper-parameter on topic coherence evaluation. For two au...
Journal
Journal title: IEEE Access
Year: 2021
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2021.3109425